|

A high-performance neuroprosthesis for speech decoding and avatar control.

Metzger, Sean L; Littlejohn, Kaylo T; Silva, Alexander B; Moses, David A; Seaton, Margaret P; Wang, Ran; Dougherty, Maximilian E; Liu, Jessie R; Wu, Peter; Berger, Michael A; Zhuravleva, Inga; Tu-Chan, Adelyn; Ganguly, Karunesh; Anumanchipalli, Gopala K; Chang, Edward F.

Nature ; 620(7976): 1037-1046, 2023 Aug.

Article En | MEDLINE | ID: mdl-37612505

Speech neuroprostheses have the potential to restore communication to people living with paralysis, but naturalistic speed and expressivity are elusive1. Here we use high-density surface recordings of the speech cortex in a clinical-trial participant with severe limb and vocal paralysis to achieve high-performance real-time decoding across three complementary speech-related output modalities: text, speech audio and facial-avatar animation. We trained and evaluated deep-learning models using neural data collected as the participant attempted to silently speak sentences. For text, we demonstrate accurate and rapid large-vocabulary decoding with a median rate of 78 words per minute and median word error rate of 25%. For speech audio, we demonstrate intelligible and rapid speech synthesis and personalization to the participant's pre-injury voice. For facial-avatar animation, we demonstrate the control of virtual orofacial movements for speech and non-speech communicative gestures. The decoders reached high performance with less than two weeks of training. Our findings introduce a multimodal speech-neuroprosthetic approach that has substantial promise to restore full, embodied communication to people living with severe paralysis.

Face , Neural Prostheses , Paralysis , Speech , Humans , Cerebral Cortex/physiology , Cerebral Cortex/physiopathology , Clinical Trials as Topic , Communication , Deep Learning , Gestures , Movement , Neural Prostheses/standards , Paralysis/physiopathology , Paralysis/rehabilitation , Vocabulary , Voice

Analyses of vocal tract cross-distance to area mapping: an investigation of a set of vowel images.

McGowan, Richard S; Jackson, Michel T-T; Berger, Michael A.

J Acoust Soc Am ; 131(1): 424-34, 2012 Jan.

Article En | MEDLINE | ID: mdl-22280604

Traditional models of mappings from midsagittal cross-distances to cross-sectional areas use only local cross-distance information. These are not the optimal models on which to base the construction of a mapping between the two domains. This can be understood because phonemic identity can affect the relation between local cross-distance and cross-sectional area. However, phonemic identity is not an appropriate independent variable for the control of an articulatory synthesizer. Two alternative approaches for constructing cross-distance to area mappings that can be used for articulatory synthesis are presented. One is a vowel height-sensitive model and the other is a non-parametric model called loess. These depend on global cross-distance information and generally perform better than the traditional models.

Pharynx/anatomy & histology , Phonetics , Speech/physiology , Vocal Cords/anatomy & histology , Analysis of Variance , Female , Humans , Magnetic Resonance Imaging , Male , Palate/anatomy & histology , Sex Characteristics , Tongue/anatomy & histology

Diffraction study of protein crystals grown in cryoloops and micromounts.

Berger, Michael A; Decker, Johannes H; Mathews, Irimpan I.

J Appl Crystallogr ; 43(Pt 6): 1513-1518, 2010 Dec 01.

Article En | MEDLINE | ID: mdl-22477781

Protein crystals are usually grown in hanging or sitting drops and generally get transferred to a loop or micromount for cryocooling and data collection. This paper describes a method for growing crystals on cryoloops for easier manipulation of the crystals for data collection. This study also investigates the steps for the automation of this process and describes the design of a new tray for the method. The diffraction patterns and the structures of three proteins grown by both the new method and the conventional hanging-drop method are compared. The new setup is optimized for the automation of the crystal mounting process. Researchers could prepare nanolitre drops under ordinary laboratory conditions by growing the crystals directly in loops or micromounts. As has been pointed out before, higher levels of supersaturation can be obtained in very small volumes, and the new method may help in the exploration of additional crystallization conditions.

Acoustic-articulatory mapping in vowels by locally weighted regression.

McGowan, Richard S; Berger, Michael A.

J Acoust Soc Am ; 126(4): 2011-32, 2009 Oct.

Article En | MEDLINE | ID: mdl-19813812

A method for mapping between simultaneously measured articulatory and acoustic data is proposed. The method uses principal components analysis on the articulatory and acoustic variables, and mapping between the domains by locally weighted linear regression, or loess [Cleveland, W. S. (1979). J. Am. Stat. Assoc. 74, 829-836]. The latter method permits local variation in the slopes of the linear regression, assuming that the function being approximated is smooth. The methodology is applied to vowels of four speakers in the Wisconsin X-ray Microbeam Speech Production Database, with formant analysis. Results are examined in terms of (1) examples of forward (articulation-to-acoustics) mappings and inverse mappings, (2) distributions of local slopes and constants, (3) examples of correlations among slopes and constants, (4) root-mean-square error, and (5) sensitivity of formant frequencies to articulatory change. It is shown that the results are qualitatively correct and that loess performs better than global regression. The forward mappings show different root-mean-square error properties than the inverse mappings indicating that this method is better suited for the forward mappings than the inverse mappings, at least for the data chosen for the current study. Some preliminary results on sensitivity of the first two formant frequencies to the two most important articulatory principal components are presented.

Models, Statistical , Phonetics , Speech Acoustics , Speech , Algorithms , Biomechanical Phenomena , Female , Humans , Linear Models , Lip/physiology , Male , Principal Component Analysis , Regression Analysis , Speech/physiology , Tongue/physiology